structure-based drug design
SculptDrug: A Spatial Condition-Aware Bayesian Flow Model for Structure-based Drug Design
Zhong, Qingsong, Yu, Haomin, Lin, Yan, Shen, Wangmeng, Zeng, Long, Hu, Jilin
Structure-based drug design (SBDD) has emerged as a popular approach in drug discovery, leveraging three-dimensional protein structures to generate drug ligands. However, existing generative models encounter several key challenges: (1) incorporating boundary condition constraints, (2) integrating hierarchical structural conditions, and (3) ensuring spatial modeling fidelity. To address these limitations, we propose SculptDrug, a spatial condition-aware generative model based on Bayesian flow networks (BFNs). First, SculptDrug follows a BFN-based framework and employs a progressive denoising strategy to ensure spatial modeling fidelity, iteratively refining atom positions while enhancing local interactions for precise spatial alignment. Second, we introduce a Boundary Awareness Block that incorporates protein surface constraints into the generative process to ensure that generated ligands are geometrically compatible with the target protein. Third, we design a Hierarchical Encoder that captures global structural context while preserving fine-grained molecular interactions, ensuring overall consistency and accurate ligand-protein conformations. We evaluate SculptDrug on the CrossDocked dataset, and experimental results demonstrate that SculptDrug outperforms state-of-the-art baselines, highlighting the effectiveness of spatial condition-aware modeling.
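The iterative refinement of atom positions mentioned above can be illustrated with the conjugate Gaussian update that Bayesian flow networks use for continuous data. The following is a minimal sketch under stated assumptions, not SculptDrug's implementation: the function names, the unit-precision prior, and the accuracy schedule `alphas` are hypothetical, and the neural network that a real BFN uses to predict the data from the current belief is omitted entirely.

```python
import random


def bayesian_update(mu, rho, y, alpha):
    """Conjugate Gaussian update of a belief N(mu, 1/rho) given a noisy
    observation y with precision alpha (one value per coordinate)."""
    rho_new = rho + alpha
    mu_new = [(m * rho + yi * alpha) / rho_new for m, yi in zip(mu, y)]
    return mu_new, rho_new


def refine_position(x_true, alphas, seed=0):
    """Refine a belief about one atom position over several noisy observations.

    x_true : illustrative ground-truth (x, y, z) coordinates
    alphas : hypothetical accuracy schedule, one precision per step
    """
    rng = random.Random(seed)
    mu, rho = [0.0, 0.0, 0.0], 1.0  # assumed prior: zero mean, unit precision
    for alpha in alphas:
        # Noisy observation of the true position with variance 1/alpha.
        y = [rng.gauss(x, alpha ** -0.5) for x in x_true]
        mu, rho = bayesian_update(mu, rho, y, alpha)
    return mu, rho


if __name__ == "__main__":
    mu, rho = refine_position([1.0, -2.0, 0.5], [100.0] * 50)
    print(mu, rho)
```

The belief precision grows additively with each observation, so the estimate tightens around the observed position as steps accumulate.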
MSCoD: An Enhanced Bayesian Updating Framework with Multi-Scale Information Bottleneck and Cooperative Attention for Structure-Based Drug Design
Xu, Long, Chen, Yongcai, Liu, Fengshuo, Peng, Yuzhong
Structure-Based Drug Design (SBDD) is a powerful strategy in computational drug discovery, utilizing three-dimensional protein structures to guide the design of molecules with improved binding affinity. However, capturing complex protein-ligand interactions across multiple scales remains challenging, as current methods often overlook the hierarchical organization and intrinsic asymmetry of these interactions. To address these limitations, we propose MSCoD, a novel Bayesian updating-based generative framework for structure-based drug design. MSCoD introduces a Multi-Scale Information Bottleneck (MSIB), which enables semantic compression at multiple abstraction levels for efficient hierarchical feature extraction. It also introduces a multi-head cooperative attention (MHCA) mechanism, which employs asymmetric protein-to-ligand attention to capture diverse interaction types while addressing the dimensionality disparity between proteins and ligands. Empirical studies show that MSCoD outperforms state-of-the-art methods on the benchmark dataset. Its real-world applicability is confirmed by case studies on difficult targets such as KRAS G12D (7XKJ). Additionally, the MSIB and MHCA modules prove transferable, boosting the performance of GraphDTA on standard drug-target affinity prediction benchmarks (Davis and KIBA). The code and data underlying this article are freely available at https://github.com/xulong0826/MSCoD.
MolChord: Structure-Sequence Alignment for Protein-Guided Drug Design
Zhang, Wei, Guo, Zekun, Xia, Yingce, Jin, Peiran, Xie, Shufang, Qin, Tao, Li, Xiang-Yang
Structure-based drug design (SBDD), which maps target proteins to candidate molecular ligands, is a fundamental task in drug discovery. Effectively aligning protein structural representations with molecular representations, and ensuring alignment between generated drugs and their pharmacological properties, remains a critical challenge. To address these challenges, we propose MolChord, which integrates two key techniques: (1) to align protein and molecule structures with their textual descriptions and sequential representations (e.g., FASTA for proteins and SMILES for molecules), we leverage NatureLM, an autoregressive model unifying text, small molecules, and proteins, as the molecule generator, alongside a diffusion-based structure encoder; and (2) to guide molecules toward desired properties, we curate a property-aware dataset by integrating preference data and refine the alignment process using Direct Preference Optimization (DPO). Experimental results on CrossDocked2020 demonstrate that our approach achieves state-of-the-art performance on key evaluation metrics, highlighting its potential as a practical tool for SBDD.
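For readers unfamiliar with Direct Preference Optimization, the per-pair objective can be sketched as follows. This is the standard DPO loss, not MolChord's training code: the function names, the default `beta`, and the example log-probabilities are illustrative assumptions, and the inputs stand for sequence log-probabilities of a preferred and a rejected molecule under the trained policy and a frozen reference model.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is a total log-probability of a generated sequence
    (for example a SMILES string) under the policy or the reference model.
    beta is an assumed temperature controlling deviation from the reference.
    """
    chosen_gain = logp_chosen - ref_logp_chosen
    rejected_gain = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy now favors the preferred molecule: the loss decreases.
    print(dpo_loss(-8.0, -12.0, -10.0, -12.0))
```

The loss depends only on how far the policy has shifted relative to the reference on each molecule of the pair, which is why a property-aware preference dataset is enough to steer generation without an explicit reward model.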
Controllable 3D Molecular Generation for Structure-Based Drug Design Through Bayesian Flow Networks and Gradient Integration
Choi, Seungyeon, Kim, Hwanhee, Park, Chihyun, Lee, Dahyeon, Lee, Seungyong, Kim, Yoonju, Park, Hyoungjoon, Kwon, Sein, Jo, Youngwan, Park, Sanghyun
Recent advances in Structure-based Drug Design (SBDD) have leveraged generative models for 3D molecular generation, predominantly evaluating model performance by binding affinity to target proteins. However, practical drug discovery necessitates high binding affinity along with synthetic feasibility and selectivity, critical properties that were largely neglected in previous evaluations. To address this gap, we identify fundamental limitations of conventional diffusion-based generative models in effectively guiding molecule generation toward these diverse pharmacological properties. We propose CByG, a novel framework extending Bayesian Flow Network into a gradient-based conditional generative model that robustly integrates property-specific guidance. Additionally, we introduce a comprehensive evaluation scheme incorporating practical benchmarks for binding affinity, synthetic feasibility, and selectivity, overcoming the limitations of conventional evaluation methods. Extensive experiments demonstrate that our proposed CByG framework significantly outperforms baseline models across multiple essential evaluation criteria, highlighting its effectiveness and practicality for real-world drug discovery applications.
Multi-domain Distribution Learning for De Novo Drug Design
Schneuing, Arne, Igashov, Ilia, Dobbelstein, Adrian W., Castiglione, Thomas, Bronstein, Michael, Correia, Bruno
To further enhance the sampling process towards distribution regions with desirable metric values, we propose a joint preference alignment scheme applicable to both flow matching and Markov bridge frameworks. Furthermore, we extend our model to also explore the conformational landscape of the protein by jointly sampling side chain angles and molecules. Small molecules are the predominant class of FDA-approved drugs with a share of 85%, and more than 95% of known drugs target human or pathogen proteins (Santos et al., 2017). At the same time, the cost and duration of the development of new drugs are skyrocketing (Simoens & Huys, 2021). This sparks increasing interest in the computational design of small molecular compounds that bind specifically to disease-associated proteins and thus reduce the amount of costly experimental testing. In recent years, the machine learning community has contributed a plethora of generative tools addressing drug design from various angles (Du et al., 2024). However, these methods typically require careful tuning of the objective function to avoid exploiting imperfect computational oracles and overly maximizing one desired property (e.g. [...]). Additionally, one often aims to design a suitable 3D binding pose along with the chemical structure of the molecule, which substantially increases the degrees of freedom. Many optimization algorithms struggle to efficiently navigate such vast design spaces. Following a different approach, probabilistic generative models learn to generate drug-like molecules directly from data (Hoogeboom et al., 2022; Vignac et al., 2022). Here, the design objectives are implicitly encoded in the training data set. While these methods may not outperform direct optimization on isolated metrics, they are well suited for the multifaceted nature of drug design as they learn "what a drug looks like" in a more general way.
Once trained on sufficient high-quality data, these models can capture a more holistic picture of the molecular space compared to models optimized for a limited set of target metrics. The strength of generative modeling lies in its ability to reproduce patterns seen in the training data.
IBEX: Information-Bottleneck-EXplored Coarse-to-Fine Molecular Generation under Limited Data
Xu, Dong, Yang, Zhangfan, Yao, Jenna Xinyi, Song, Shuangbao, Zhu, Zexuan, Ji, Junkai
Three-dimensional generative models increasingly drive structure-based drug discovery, yet they remain constrained by the scarcity of publicly available protein-ligand complexes. Under such data scarcity, almost all existing pipelines struggle to learn transferable geometric priors and consequently overfit to training-set biases. We therefore present IBEX, an Information-Bottleneck-EXplored coarse-to-fine pipeline that tackles the chronic shortage of protein-ligand complex data in structure-based drug design. Specifically, we use PAC-Bayesian information-bottleneck theory to quantify the information density of each sample. This analysis reveals how different masking strategies affect generalization and indicates that, compared with conventional de novo generation, the constrained Scaffold Hopping task endows the model with greater effective capacity and improved transfer performance. IBEX retains the original TargetDiff architecture and training hyperparameters to generate molecules compatible with the binding pocket; it then applies an L-BFGS optimization step that refines each conformation by optimizing five physics-based terms and adjusting six translational and rotational degrees of freedom in under one second. With only these modifications, IBEX raises the zero-shot docking success rate on the CrossDocked2020-based CBGBench benchmark from 53% to 64%, improves the mean Vina score from $-7.41\ \mathrm{kcal\,mol^{-1}}$ to $-8.07\ \mathrm{kcal\,mol^{-1}}$, and achieves the best median Vina energy in 57 of 100 pockets versus 3 for the original TargetDiff. IBEX also increases the QED by 25%, achieves state-of-the-art validity and diversity, and markedly reduces extrapolation error.
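The six translational and rotational degrees of freedom mentioned above correspond to a rigid-body move of the ligand. The sketch below shows one common way to parameterize such a move, with three translation components and a rotation vector applied about the ligand centroid. It is an assumption-laden illustration, not IBEX's code: the function name and parameter layout are hypothetical, and neither the five physics-based terms nor the L-BFGS optimizer that would search over these six parameters is included.

```python
import math


def rigid_transform(coords, params):
    """Apply a rigid-body move to a list of (x, y, z) atom positions.

    params = (tx, ty, tz, rx, ry, rz): a translation plus a rotation vector
    (unit axis scaled by the angle in radians). The rotation is applied
    about the centroid of coords, then the translation is added.
    """
    tx, ty, tz, rx, ry, rz = params
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    moved = []
    for x, y, z in coords:
        vx, vy, vz = x - cx, y - cy, z - cz
        if theta > 1e-12:
            kx, ky, kz = rx / theta, ry / theta, rz / theta
            c, s = math.cos(theta), math.sin(theta)
            # Rodrigues' formula: v' = v*c + (k x v)*s + k*(k.v)*(1 - c)
            dot = kx * vx + ky * vy + kz * vz
            crx = ky * vz - kz * vy
            cry = kz * vx - kx * vz
            crz = kx * vy - ky * vx
            vx, vy, vz = (
                vx * c + crx * s + kx * dot * (1.0 - c),
                vy * c + cry * s + ky * dot * (1.0 - c),
                vz * c + crz * s + kz * dot * (1.0 - c),
            )
        moved.append((vx + cx + tx, vy + cy + ty, vz + cz + tz))
    return moved


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    # Quarter turn about the z axis, no translation.
    print(rigid_transform(ligand, (0.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2)))
```

Because the move is rigid, all intramolecular distances are preserved, so an optimizer working on these six parameters adjusts only the pose of the ligand in the pocket and leaves its internal geometry unchanged.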